Bagging Probit Models for Unbalanced Classification
Abstract
The 11th Pacific-Asia Knowledge Discovery and Data Mining Conference (PAKDD 2007) hosted a data mining competition, co-organized by the Singapore Institute of Statistics. The data set comes from a consumer finance company, and the aim is to find solutions to a cross-selling business problem. The company currently has two databases, one for credit card holders and the other for home loan (mortgage) customers, and it would like to use this opportunity to cross-sell home loans to its credit card holders. It is therefore keenly interested in an effective scoring model for predicting potential cross-sell take-ups. The training dataset contains information on 40,700 customers with 40 input variables, most of which relate to the point of application for the company’s credit card, plus a binary target variable indicating the home loan take-up status. This is a sample of customers who opened a new credit card with the company within a specific 2-year period and did not have an existing home loan with the company. The binary target variable has a value of 1 if the customer then opened a home loan with the company within 12 months after opening the credit...
Similar resources
Improving reservoir rock classification in heterogeneous carbonates using boosting and bagging strategies: A case study of early Triassic carbonates of coastal Fars, south Iran
An accurate reservoir characterization is a crucial task for the development of quantitative geological models and reservoir simulation. In the present research work, a novel view is presented on the reservoir characterization using the advantages of thin section image analysis and intelligent classification algorithms. The proposed methodology comprises three main steps. First, four classes of...
A Validation Test Naive Bayesian Classification Algorithm and Probit Regression as Prediction Models for Managerial Overconfidence in Iran's Capital Market
Corporate directors are influenced by overconfidence, one of the personality traits of individuals; they may make irrational decisions that have a significant impact on the company's performance in the long run. The purpose of this paper is to validate and compare the Naive Bayesian Classification algorithm and probit regression in the prediction of management's overconfidence at pre...
Unbalance Quantitative Structure Activity Relationship Problem Reduction in Drug Design
Problem statement: Activities of drug molecules can be predicted by Quantitative Structure Activity Relationship (QSAR) models, which avoid the high cost and long cycle of traditional experimental methods. Because drug molecules with positive activity are far fewer than those with negative activity, it is important to predict molecular activities consi...
A Combination of Boosting and Bagging for KDD Cup 2009 - Fast Scoring on a Large Database
We present the ideas and methodologies that we used to address the KDD Cup 2009 challenge on rank-ordering the probability of churn, appetency and up-selling of wireless customers. We choose stochastic gradient boosting tree (TreeNet®) as our main classifier to handle this large unbalanced dataset. In order to further improve the robustness and accuracy of our results, we bag a series of boos...
Prediction of unwanted pregnancies using logistic regression, probit regression and discriminant analysis
Background: Unwanted pregnancy not intended by at least one of the parents has undesirable consequences for the family and the society. In the present study, three classification models were used and compared to predict unwanted pregnancies in an urban population. Methods: In this cross-sectional study, 887 pregnant mothers referring to health centers in Khorramabad, Iran, in 2012 were ...